nlp_architect.models.chunker.SequenceChunker

class nlp_architect.models.chunker.SequenceChunker(use_cudnn=False)

A sequence chunker model written in TensorFlow (and Keras), based on the SequenceTagger model. Only the chunking output of the underlying tagger is used.

__init__(use_cudnn=False)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([use_cudnn]) Initialize self.
build(vocabulary_size, num_pos_labels, …) Build a chunker/POS model
fit(x, y[, batch_size, epochs, …]) Fit the built model on the provided x and y samples
load(filepath) Load model from disk
load_embedding_weights(weights) Load word embedding weights into the model embedding layer
predict(x[, batch_size]) Predict labels given x.
save(filepath) Save the model to disk
build(vocabulary_size, num_pos_labels, num_chunk_labels, char_vocab_size=None, max_word_len=25, feature_size=100, dropout=0.5, classifier='softmax', optimizer=None)

Build a chunker/POS model

Parameters:
  • vocabulary_size (int) – the size of the input vocabulary
  • num_pos_labels (int) – the number of POS labels
  • num_chunk_labels (int) – the number of chunk labels
  • char_vocab_size (int, optional) – character vocabulary size
  • max_word_len (int, optional) – max characters in a word
  • feature_size (int, optional) – feature size - determines the embedding/LSTM layer hidden state size
  • dropout (float, optional) – dropout rate
  • classifier (str, optional) – classifier layer: ‘softmax’ for a softmax classifier or ‘crf’ for a conditional random field classifier. Default is ‘softmax’.
  • optimizer (tensorflow.python.training.optimizer.Optimizer, optional) – optimizer; if None, the default SGD optimizer (paper setup) is used
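
The three size arguments are usually derived from the vocabularies built during preprocessing. The following is a minimal sketch, assuming each vocabulary is a plain dict from symbol to integer id that already includes any padding entry. The helper name `chunker_build_kwargs` and the toy vocabularies are hypothetical and not part of NLP Architect.

```python
def chunker_build_kwargs(word_vocab, pos_labels, chunk_labels, char_vocab=None):
    """Collect keyword arguments for SequenceChunker.build from vocab mappings."""
    kwargs = {
        "vocabulary_size": len(word_vocab),
        "num_pos_labels": len(pos_labels),
        "num_chunk_labels": len(chunk_labels),
    }
    # char_vocab_size is optional; leave it out when no character vocabulary is used.
    if char_vocab is not None:
        kwargs["char_vocab_size"] = len(char_vocab)
    return kwargs


# Toy vocabularies (hypothetical); id 0 is assumed to be reserved for padding.
word_vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}
pos_labels = {"<pad>": 0, "DT": 1, "NN": 2, "VBD": 3}
chunk_labels = {"<pad>": 0, "B-NP": 1, "I-NP": 2, "B-VP": 3}

build_kwargs = chunker_build_kwargs(word_vocab, pos_labels, chunk_labels)
print(build_kwargs)
```

With NLP Architect installed, the result could be passed on as `SequenceChunker().build(**build_kwargs, classifier='softmax')`. Whether the padding id has to be counted in each size depends on how the vocabularies were built, so that detail should be checked against the preprocessing code in use.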
fit(x, y, batch_size=1, epochs=1, validation_data=None, callbacks=None)

Fit the built model on the provided x and y samples

Parameters:
  • x – x samples
  • y – y samples
  • batch_size (int, optional) – number of samples per batch
  • epochs (int, optional) – number of epochs to run before ending the training process
  • validation_data (optional) – x and y samples to validate on at the end of each epoch
  • callbacks (optional) – additional callbacks to run with fitting
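
Because the class wraps a Keras model, `validation_data` is presumably a pair `(x_val, y_val)` in the Keras convention. That is an assumption, as is the helper `holdout_split` below, which sketches one way to hold out the tail of a dataset for validation.

```python
def holdout_split(x, y, validation_fraction=0.2):
    """Split parallel sample sequences into train and validation parts."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    # Always keep at least one sample for validation.
    n_val = max(1, int(len(x) * validation_fraction))
    split = len(x) - n_val
    return (x[:split], y[:split]), (x[split:], y[split:])


# Toy data (hypothetical): five padded sentences of word ids and chunk label ids.
x = [[1, 2, 3], [4, 5, 0], [6, 7, 8], [9, 1, 0], [2, 3, 4]]
y = [[1, 2, 3], [1, 3, 0], [1, 2, 2], [3, 1, 0], [1, 2, 3]]

(x_train, y_train), validation_data = holdout_split(x, y)
print(len(x_train), len(validation_data[0]))
```

The resulting pair could then be passed as `fit(x_train, y_train, validation_data=validation_data)`. The real model may expect x and y to be lists of arrays (for example word and character inputs, or POS and chunk targets); in that case each array would need to be split at the same index.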
load(filepath)

Load model from disk

Parameters:
  • filepath (str) – file name of model
load_embedding_weights(weights)

Load word embedding weights into the model embedding layer

Parameters:
  • weights (numpy.ndarray) – 2D matrix of word weights
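
A 2D weight matrix of this kind is commonly assembled from pretrained word vectors. The sketch below assumes that row i holds the vector for word id i, that id 0 is padding, and that the number of columns matches the embedding size used in build. The helper `build_embedding_matrix` and the toy vectors are hypothetical.

```python
import numpy as np


def build_embedding_matrix(word_vocab, vectors, dim, seed=0):
    """Return a (len(word_vocab), dim) float32 matrix indexed by word id."""
    rng = np.random.default_rng(seed)
    # Words without a pretrained vector keep a small random initialization.
    weights = rng.uniform(-0.05, 0.05, size=(len(word_vocab), dim)).astype(np.float32)
    for word, idx in word_vocab.items():
        if word in vectors:
            weights[idx] = vectors[word]
    # Keep the padding row at zero (assumes id 0 is the padding id).
    weights[0] = 0.0
    return weights


# Toy vocabulary and pretrained vectors (hypothetical).
word_vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}
vectors = {"the": [0.1, 0.2, 0.3], "cat": [0.4, 0.5, 0.6]}

weights = build_embedding_matrix(word_vocab, vectors, dim=3)
print(weights.shape)
```

The matrix would then be handed to `load_embedding_weights(weights)`, presumably after `build` has created the embedding layer.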
predict(x, batch_size=1)

Predict labels given x.

Parameters:
  • x – samples for inference
  • batch_size (int, optional) – forward pass batch size
Returns:

tuple of numpy arrays of chunk labels
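
The returned arrays usually need to be mapped back to label strings. The sketch below assumes the prediction is an array of per-label scores with shape (num_sentences, seq_len, num_chunk_labels), as a softmax classifier would produce; the output of the ‘crf’ classifier may be laid out differently. The helper `decode_chunk_labels`, the label mapping, and the scores are hypothetical.

```python
import numpy as np


def decode_chunk_labels(scores, id_to_label, lengths):
    """Map per-token label scores to label strings, dropping padded positions."""
    label_ids = np.argmax(scores, axis=-1)  # shape: (num_sentences, seq_len)
    return [
        [id_to_label[int(i)] for i in row[:n]]
        for row, n in zip(label_ids, lengths)
    ]


# Toy scores (hypothetical): 1 sentence, 3 token positions, 3 labels.
scores = np.array([[[0.1, 0.8, 0.1],
                    [0.2, 0.1, 0.7],
                    [0.9, 0.05, 0.05]]])
id_to_label = {0: "O", 1: "B-NP", 2: "I-NP"}

# The sentence has two real tokens; the third position is padding.
decoded = decode_chunk_labels(scores, id_to_label, lengths=[2])
print(decoded)
```

Passing the true sentence lengths keeps padded positions out of the decoded output, which matters when predictions are compared against gold chunk tags.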

save(filepath)

Save the model to disk

Parameters:
  • filepath (str) – file name to save model